AITopics | Morgan County

Morgan County

Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

arXiv.org Artificial Intelligence, Aug-14-2023

Our test sets were too small and too haphazard to support statistically valid conclusions, but they do suggest a number of conclusions. We summarize these here, and discuss them at greater length in section 7. Over the kinds of problems tested, GPT-4 with either plug-in is significantly stronger than GPT-4 by itself, or, almost certainly, than any AI that existed a year ago. However, it is still far from reliable; it often outputs a wrong answer or fails to output any answer. In terms of overall score, we would judge that these systems perform at the level of a middling undergraduate student. However, their capacities and weaknesses do not align with those of a human student; the systems solve some problems that even capable students would find challenging, whereas they fail on some problems that even middling high school students would find easy.

calculation, large language model, machine learning, (21 more...)

arXiv:2308.05713

Country:

North America > United States > Michigan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec (0.04)
(40 more...)

Genre: Research Report (0.41)

Industry: Education > Educational Setting > K-12 Education > Secondary School (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
